Overview

Dataset statistics

Number of variables: 13
Number of observations: 397884
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 4835
Duplicate rows (%): 1.2%
Total size in memory: 36.4 MiB
Average record size in memory: 96.0 B
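
The two derived figures in this block (duplicate percentage and average record size) follow from the raw counts by simple arithmetic. The sketch below recomputes them from the numbers printed above. It assumes that "MiB" means 1024 × 1024 bytes, and because the 36.4 MiB total is itself rounded, the record size only agrees with the reported 96.0 B to within rounding.

```python
# Figures copied from the "Dataset statistics" block of the report.
n_rows = 397_884
n_duplicates = 4_835
total_mib = 36.4  # rounded value as printed in the report

# Share of rows flagged as duplicates, in percent.
duplicate_pct = 100 * n_duplicates / n_rows

# Average bytes per row, assuming 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_mib * 1024 * 1024 / n_rows

print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```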

Variable types

Numeric: 8
Unsupported: 1
Text: 1
DateTime: 1
Categorical: 2

Alerts

Dataset has 4835 (1.2%) duplicate rows [Duplicates]
Country is highly overall correlated with InvoiceNo [High correlation]
InvoiceNo is highly overall correlated with Country and 2 other fields [High correlation]
Month is highly overall correlated with InvoiceNo [High correlation]
Quantity is highly overall correlated with TotalAmount [High correlation]
TotalAmount is highly overall correlated with Quantity [High correlation]
Year is highly overall correlated with InvoiceNo [High correlation]
Country is highly imbalanced (82.7%) [Imbalance]
Year is highly imbalanced (65.0%) [Imbalance]
Quantity is highly skewed (γ1 = 409.8929717) [Skewed]
UnitPrice is highly skewed (γ1 = 204.0327268) [Skewed]
TotalAmount is highly skewed (γ1 = 451.4431818) [Skewed]
StockCode is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
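
The report does not say how the imbalance percentages are computed. One definition that reproduces the Year figure is one minus the normalised Shannon entropy of the class shares. The sketch below applies that assumed definition to the Year counts listed in the Year section; `imbalance_score` is a hypothetical helper name, not a ydata-profiling function.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the class shares.

    0 means perfectly balanced classes, 1 means a single class holds everything.
    This definition is an assumption; the report does not state its formula.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Year counts as listed in the report: 2011 and 2010.
year_counts = [371_727, 26_157]
year_score = imbalance_score(year_counts)
print(f"Year imbalance: {100 * year_score:.1f}%")
```

Checking Country the same way would need all 37 category counts, which the report only partly lists.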

Reproduction

Analysis started: 2024-12-15 09:31:21.619380
Analysis finished: 2024-12-15 09:31:38.381909
Duration: 16.76 seconds
Software version: ydata-profiling v4.12.1
Download configuration: config.json

Variables

InvoiceNo
Real number (ℝ)

High correlation 

Distinct: 18532
Distinct (%): 4.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 560616.93
Minimum: 536365
Maximum: 581587
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 MiB

Quantile statistics

Minimum: 536365
5-th percentile: 538863
Q1: 549234
Median: 561893
Q3: 572090
95-th percentile: 579493
Maximum: 581587
Range: 45222
Interquartile range (IQR): 22856

Descriptive statistics

Standard deviation: 13106.118
Coefficient of variation (CV): 0.023378027
Kurtosis: -1.200748
Mean: 560616.93
Median Absolute Deviation (MAD): 11266
Skewness: -0.17852408
Sum: 2.2306051 × 10^11
Variance: 1.7177032 × 10^8
Monotonicity: Not monotonic
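
Several of the InvoiceNo figures are derived from others in the same block: the range from the minimum and maximum, the IQR from the quartiles, the CV from the standard deviation and mean, and the variance from the standard deviation. As a rough consistency check, the sketch below recomputes them from the printed (rounded) inputs, so small differences in the last digits are expected.

```python
# Values copied from the InvoiceNo "Quantile statistics" and
# "Descriptive statistics" blocks of the report.
minimum, q1, q3, maximum = 536365, 549234, 572090, 581587
mean, std = 560616.93, 13106.118

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance is the squared standard deviation

print(f"Range: {value_range}")
print(f"IQR: {iqr}")
print(f"CV: {cv:.6f}")
print(f"Variance: {variance:.4e}")
```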
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
576339 | 542 | 0.1%
579196 | 533 | 0.1%
580727 | 529 | 0.1%
578270 | 442 | 0.1%
573576 | 435 | 0.1%
567656 | 421 | 0.1%
567183 | 399 | 0.1%
575607 | 377 | 0.1%
571441 | 364 | 0.1%
570488 | 353 | 0.1%
Other values (18522) | 393489 | 98.9%

Minimum 10 values

Value | Count | Frequency (%)
536365 | 7 | < 0.1%
536366 | 2 | < 0.1%
536367 | 12 | < 0.1%
536368 | 4 | < 0.1%
536369 | 1 | < 0.1%
536370 | 20 | < 0.1%
536371 | 1 | < 0.1%
536372 | 2 | < 0.1%
536373 | 16 | < 0.1%
536374 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
581587 | 15 | < 0.1%
581586 | 4 | < 0.1%
581585 | 21 | < 0.1%
581584 | 2 | < 0.1%
581583 | 2 | < 0.1%
581582 | 2 | < 0.1%
581581 | 3 | < 0.1%
581580 | 24 | < 0.1%
581579 | 30 | < 0.1%
581578 | 38 | < 0.1%

StockCode
Unsupported

Rejected  Unsupported 

Missing: 0
Missing (%): 0.0%
Memory size: 6.1 MiB

Description
Text

Distinct: 3877
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 MiB

Length

Max length: 35
Median length: 28
Mean length: 26.677454
Min length: 6

Characters and Unicode

Total characters: 10614532
Distinct characters: 68
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 213
Unique (%): 0.1%

Sample

1st row: WHITE HANGING HEART T-LIGHT HOLDER
2nd row: WHITE METAL LANTERN
3rd row: CREAM CUPID HEARTS COAT HANGER
4th row: KNITTED UNION FLAG HOT WATER BOTTLE
5th row: RED WOOLLY HOTTIE WHITE HEART.

Words

Value | Count | Frequency (%)
of | 40804 | 2.3%
set | 40719 | 2.3%
bag | 37774 | 2.2%
red | 31813 | 1.8%
heart | 29307 | 1.7%
retrospot | 26336 | 1.5%
vintage | 25579 | 1.5%
design | 23519 | 1.3%
pink | 20142 | 1.2%
christmas | 19057 | 1.1%
Other values (2179) | 1453890 | 83.1%

Most occurring characters

Value | Count | Frequency (%)
(space) | 1453261 | 13.7%
E | 951356 | 9.0%
A | 806740 | 7.6%
T | 705588 | 6.6%
R | 678203 | 6.4%
O | 634322 | 6.0%
I | 580299 | 5.5%
S | 571883 | 5.4%
N | 528406 | 5.0%
L | 519172 | 4.9%
Other values (58) | 3185302 | 30.0%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 10614532 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 10614532 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 10614532 | 100.0%

Quantity
Real number (ℝ)

High correlation  Skewed 

Distinct: 301
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.988238
Minimum: 1
Maximum: 80995
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 6
Q3: 12
95-th percentile: 36
Maximum: 80995
Range: 80994
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 179.33177
Coefficient of variation (CV): 13.807245
Kurtosis: 178186.24
Mean: 12.988238
Median Absolute Deviation (MAD): 5
Skewness: 409.89297
Sum: 5167812
Variance: 32159.886
Monotonicity: Not monotonic
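
A skewness of about 410 next to a median of 6 suggests that a handful of very large orders dominate the third moment. The sketch below shows the effect on a hypothetical toy sample, not the real Quantity column. It uses the plain moment-based estimator g1 = m3 / m2^1.5, and the report does not state which estimator it uses, so the numbers are illustrative only.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical toy sample: 400 small orders plus one order of 80995 units
# (the maximum reported above). The small values are made up.
small_orders = [1, 2, 2, 6, 6, 12, 12, 24] * 50
with_outlier = small_orders + [80995]

print(f"skewness without the large order: {skewness(small_orders):.2f}")
print(f"skewness with the large order:    {skewness(with_outlier):.2f}")
```

On data shaped like this, a log transform or a quantile-based summary (median, IQR) is usually more informative than the mean and standard deviation.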
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
1 | 73301 | 18.4%
12 | 60031 | 15.1%
2 | 57999 | 14.6%
6 | 37688 | 9.5%
4 | 32180 | 8.1%
3 | 26948 | 6.8%
24 | 23748 | 6.0%
10 | 21212 | 5.3%
8 | 11644 | 2.9%
5 | 8148 | 2.0%
Other values (291) | 44985 | 11.3%

Minimum 10 values

Value | Count | Frequency (%)
1 | 73301 | 18.4%
2 | 57999 | 14.6%
3 | 26948 | 6.8%
4 | 32180 | 8.1%
5 | 8148 | 2.0%
6 | 37688 | 9.5%
7 | 1299 | 0.3%
8 | 11644 | 2.9%
9 | 1170 | 0.3%
10 | 21212 | 5.3%

Maximum 10 values

Value | Count | Frequency (%)
80995 | 1 | < 0.1%
74215 | 1 | < 0.1%
4800 | 1 | < 0.1%
4300 | 1 | < 0.1%
3906 | 1 | < 0.1%
3186 | 1 | < 0.1%
3114 | 2 | < 0.1%
3000 | 1 | < 0.1%
2880 | 2 | < 0.1%
2700 | 1 | < 0.1%

InvoiceDate
DateTime

Distinct: 17282
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 MiB
Minimum: 2010-12-01 08:26:00
Maximum: 2011-12-09 12:50:00
Invalid dates: 0
Invalid dates (%): 0.0%
[Figure: histogram with fixed size bins (bins=50)]

UnitPrice
Real number (ℝ)

Skewed 

Distinct: 440
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.1164878
Minimum: 0.001
Maximum: 8142.75
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 MiB

Quantile statistics

Minimum: 0.001
5-th percentile: 0.42
Q1: 1.25
Median: 1.95
Q3: 3.75
95-th percentile: 8.5
Maximum: 8142.75
Range: 8142.749
Interquartile range (IQR): 2.5

Descriptive statistics

Standard deviation: 22.097877
Coefficient of variation (CV): 7.0906348
Kurtosis: 58140.397
Mean: 3.1164878
Median Absolute Deviation (MAD): 1.1
Skewness: 204.03273
Sum: 1240000.6
Variance: 488.31615
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
1.25 | 45841 | 11.5%
1.65 | 36834 | 9.3%
2.95 | 26562 | 6.7%
0.85 | 25968 | 6.5%
0.42 | 21812 | 5.5%
4.95 | 18122 | 4.6%
3.75 | 17676 | 4.4%
2.1 | 17190 | 4.3%
2.08 | 15745 | 4.0%
1.95 | 12677 | 3.2%
Other values (430) | 159457 | 40.1%

Minimum 10 values

Value | Count | Frequency (%)
0.001 | 4 | < 0.1%
0.04 | 66 | < 0.1%
0.06 | 112 | < 0.1%
0.07 | 7 | < 0.1%
0.08 | 55 | < 0.1%
0.09 | 2 | < 0.1%
0.1 | 53 | < 0.1%
0.12 | 635 | 0.2%
0.14 | 87 | < 0.1%
0.16 | 45 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
8142.75 | 1 | < 0.1%
4161.06 | 2 | < 0.1%
3949.32 | 1 | < 0.1%
3155.95 | 1 | < 0.1%
2500 | 1 | < 0.1%
2382.92 | 1 | < 0.1%
2118.74 | 1 | < 0.1%
2053.07 | 1 | < 0.1%
2033.1 | 1 | < 0.1%
1867.86 | 1 | < 0.1%

CustomerID
Real number (ℝ)

Distinct: 4338
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15294.423
Minimum: 12346
Maximum: 18287
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 MiB

Quantile statistics

Minimum: 12346
5-th percentile: 12627
Q1: 13969
Median: 15159
Q3: 16795
95-th percentile: 17912
Maximum: 18287
Range: 5941
Interquartile range (IQR): 2826

Descriptive statistics

Standard deviation: 1713.1416
Coefficient of variation (CV): 0.11201086
Kurtosis: -1.180822
Mean: 15294.423
Median Absolute Deviation (MAD): 1479
Skewness: 0.025728933
Sum: 6.0854064 × 10^9
Variance: 2934854
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
17841 | 7847 | 2.0%
14911 | 5675 | 1.4%
14096 | 5111 | 1.3%
12748 | 4595 | 1.2%
14606 | 2700 | 0.7%
15311 | 2379 | 0.6%
14646 | 2076 | 0.5%
13089 | 1818 | 0.5%
13263 | 1677 | 0.4%
14298 | 1637 | 0.4%
Other values (4328) | 362369 | 91.1%

Minimum 10 values

Value | Count | Frequency (%)
12346 | 1 | < 0.1%
12347 | 182 | < 0.1%
12348 | 31 | < 0.1%
12349 | 73 | < 0.1%
12350 | 17 | < 0.1%
12352 | 85 | < 0.1%
12353 | 4 | < 0.1%
12354 | 58 | < 0.1%
12355 | 13 | < 0.1%
12356 | 59 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
18287 | 70 | < 0.1%
18283 | 756 | 0.2%
18282 | 12 | < 0.1%
18281 | 7 | < 0.1%
18280 | 10 | < 0.1%
18278 | 9 | < 0.1%
18277 | 8 | < 0.1%
18276 | 14 | < 0.1%
18274 | 11 | < 0.1%
18273 | 3 | < 0.1%

Country
Categorical

High correlation  Imbalance 

Distinct: 37
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 MiB

Value | Count
United Kingdom | 354321
Germany | 9040
France | 8341
EIRE | 7236
Spain | 2484
Other values (32) | 16462

Length

Max length: 20
Median length: 14
Mean length: 13.204638
Min length: 3

Characters and Unicode

Total characters: 5253914
Distinct characters: 40
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: United Kingdom
2nd row: United Kingdom
3rd row: United Kingdom
4th row: United Kingdom
5th row: United Kingdom

Common Values

Value | Count | Frequency (%)
United Kingdom | 354321 | 89.1%
Germany | 9040 | 2.3%
France | 8341 | 2.1%
EIRE | 7236 | 1.8%
Spain | 2484 | 0.6%
Netherlands | 2359 | 0.6%
Belgium | 2031 | 0.5%
Switzerland | 1841 | 0.5%
Portugal | 1462 | 0.4%
Australia | 1182 | 0.3%
Other values (27) | 7587 | 1.9%

Length

[Figure: histogram of lengths of the category]

Words

Value | Count | Frequency (%)
united | 354389 | 47.1%
kingdom | 354321 | 47.0%
germany | 9040 | 1.2%
france | 8341 | 1.1%
eire | 7236 | 1.0%
spain | 2484 | 0.3%
netherlands | 2359 | 0.3%
belgium | 2031 | 0.3%
switzerland | 1841 | 0.2%
portugal | 1462 | 0.2%
Other values (33) | 9679 | 1.3%

Most occurring characters

Value | Count | Frequency (%)
n | 738932 | 14.1%
i | 718331 | 13.7%
d | 715710 | 13.6%
e | 384188 | 7.3%
m | 365960 | 7.0%
t | 362664 | 6.9%
g | 358036 | 6.8%
o | 357571 | 6.8%
(space) | 355299 | 6.8%
U | 354812 | 6.8%
Other values (30) | 542411 | 10.3%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 5253914 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 5253914 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 5253914 | 100.0%

Hour
Real number (ℝ)

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.728202
Minimum: 6
Maximum: 20
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.6 MiB

Quantile statistics

Minimum: 6
5-th percentile: 9
Q1: 11
Median: 13
Q3: 14
95-th percentile: 17
Maximum: 20
Range: 14
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.2735189
Coefficient of variation (CV): 0.17862058
Kurtosis: -0.20969555
Mean: 12.728202
Median Absolute Deviation (MAD): 2
Skewness: 0.18902901
Sum: 5064348
Variance: 5.1688881
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=15)]

Common values

Value | Count | Frequency (%)
12 | 72065 | 18.1%
13 | 64026 | 16.1%
14 | 54118 | 13.6%
11 | 49084 | 12.3%
15 | 45369 | 11.4%
10 | 37997 | 9.5%
16 | 24089 | 6.1%
9 | 21944 | 5.5%
17 | 13071 | 3.3%
8 | 8690 | 2.2%
Other values (5) | 7431 | 1.9%

Minimum 10 values

Value | Count | Frequency (%)
6 | 1 | < 0.1%
7 | 379 | 0.1%
8 | 8690 | 2.2%
9 | 21944 | 5.5%
10 | 37997 | 9.5%
11 | 49084 | 12.3%
12 | 72065 | 18.1%
13 | 64026 | 16.1%
14 | 54118 | 13.6%
15 | 45369 | 11.4%

Maximum 10 values

Value | Count | Frequency (%)
20 | 802 | 0.2%
19 | 3321 | 0.8%
18 | 2928 | 0.7%
17 | 13071 | 3.3%
16 | 24089 | 6.1%
15 | 45369 | 11.4%
14 | 54118 | 13.6%
13 | 64026 | 16.1%
12 | 72065 | 18.1%
11 | 49084 | 12.3%

Day
Real number (ℝ)

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.042186
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 7
Median: 15
Q3: 22
95-th percentile: 29
Maximum: 31
Range: 30
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 8.6537465
Coefficient of variation (CV): 0.57529848
Kurtosis: -1.1728432
Mean: 15.042186
Median Absolute Deviation (MAD): 8
Skewness: 0.11448166
Sum: 5985045
Variance: 74.887328
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=31)]

Common values

Value | Count | Frequency (%)
6 | 18346 | 4.6%
5 | 16409 | 4.1%
8 | 15854 | 4.0%
7 | 15601 | 3.9%
17 | 14912 | 3.7%
4 | 14880 | 3.7%
20 | 14667 | 3.7%
23 | 14290 | 3.6%
13 | 14172 | 3.6%
14 | 14165 | 3.6%
Other values (21) | 244588 | 61.5%

Minimum 10 values

Value | Count | Frequency (%)
1 | 13629 | 3.4%
2 | 12101 | 3.0%
3 | 10875 | 2.7%
4 | 14880 | 3.7%
5 | 16409 | 4.1%
6 | 18346 | 4.6%
7 | 15601 | 3.9%
8 | 15854 | 4.0%
9 | 12947 | 3.3%
10 | 14072 | 3.5%

Maximum 10 values

Value | Count | Frequency (%)
31 | 6770 | 1.7%
30 | 10034 | 2.5%
29 | 8137 | 2.0%
28 | 13509 | 3.4%
27 | 12432 | 3.1%
26 | 8710 | 2.2%
25 | 12008 | 3.0%
24 | 12086 | 3.0%
23 | 14290 | 3.6%
22 | 12403 | 3.1%

Month
Real number (ℝ)

High correlation 

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.612475
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 5
Median: 8
Q3: 11
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.4165196
Coefficient of variation (CV): 0.44880536
Kurtosis: -1.0744883
Mean: 7.612475
Median Absolute Deviation (MAD): 3
Skewness: -0.44480255
Sum: 3028882
Variance: 11.672606
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=12)]

Common values

Value | Count | Frequency (%)
11 | 64531 | 16.2%
10 | 49554 | 12.5%
12 | 43461 | 10.9%
9 | 40028 | 10.1%
5 | 28320 | 7.1%
6 | 27185 | 6.8%
3 | 27175 | 6.8%
8 | 27007 | 6.8%
7 | 26825 | 6.7%
4 | 22642 | 5.7%
Other values (2) | 41156 | 10.3%

Minimum 10 values

Value | Count | Frequency (%)
1 | 21229 | 5.3%
2 | 19927 | 5.0%
3 | 27175 | 6.8%
4 | 22642 | 5.7%
5 | 28320 | 7.1%
6 | 27185 | 6.8%
7 | 26825 | 6.7%
8 | 27007 | 6.8%
9 | 40028 | 10.1%
10 | 49554 | 12.5%

Maximum 10 values

Value | Count | Frequency (%)
12 | 43461 | 10.9%
11 | 64531 | 16.2%
10 | 49554 | 12.5%
9 | 40028 | 10.1%
8 | 27007 | 6.8%
7 | 26825 | 6.7%
6 | 27185 | 6.8%
5 | 28320 | 7.1%
4 | 22642 | 5.7%
3 | 27175 | 6.8%

Year
Categorical

High correlation  Imbalance 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 MiB

Value | Count
2011 | 371727
2010 | 26157

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 1591536
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2010
2nd row: 2010
3rd row: 2010
4th row: 2010
5th row: 2010

Common Values

Value | Count | Frequency (%)
2011 | 371727 | 93.4%
2010 | 26157 | 6.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Words

Value | Count | Frequency (%)
2011 | 371727 | 93.4%
2010 | 26157 | 6.6%

Most occurring characters

Value | Count | Frequency (%)
1 | 769611 | 48.4%
0 | 424041 | 26.6%
2 | 397884 | 25.0%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 1591536 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 1591536 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 1591536 | 100.0%

TotalAmount
Real number (ℝ)

High correlation  Skewed 

Distinct: 2939
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.397
Minimum: 0.001
Maximum: 168469.6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 MiB

Quantile statistics

Minimum: 0.001
5-th percentile: 1.25
Q1: 4.68
Median: 11.8
Q3: 19.8
95-th percentile: 67.5
Maximum: 168469.6
Range: 168469.6
Interquartile range (IQR): 15.12

Descriptive statistics

Standard deviation: 309.07104
Coefficient of variation (CV): 13.799663
Kurtosis: 232155.12
Mean: 22.397
Median Absolute Deviation (MAD): 7.55
Skewness: 451.44318
Sum: 8911407.9
Variance: 95524.909
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
15 | 20082 | 5.0%
17.7 | 9174 | 2.3%
16.5 | 8490 | 2.1%
10.2 | 8028 | 2.0%
19.8 | 7625 | 1.9%
1.25 | 7552 | 1.9%
3.75 | 6847 | 1.7%
1.65 | 5751 | 1.4%
10.5 | 5550 | 1.4%
20.8 | 5524 | 1.4%
Other values (2929) | 313261 | 78.7%

Minimum 10 values

Value | Count | Frequency (%)
0.001 | 4 | < 0.1%
0.06 | 1 | < 0.1%
0.08 | 1 | < 0.1%
0.1 | 3 | < 0.1%
0.12 | 24 | < 0.1%
0.14 | 4 | < 0.1%
0.16 | 1 | < 0.1%
0.18 | 2 | < 0.1%
0.19 | 95 | < 0.1%
0.21 | 109 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
168469.6 | 1 | < 0.1%
77183.6 | 1 | < 0.1%
38970 | 1 | < 0.1%
8142.75 | 1 | < 0.1%
7144.72 | 1 | < 0.1%
6539.4 | 2 | < 0.1%
4992 | 1 | < 0.1%
4921.5 | 1 | < 0.1%
4632 | 1 | < 0.1%
4522.5 | 1 | < 0.1%

Interactions

[Figures: 64 interaction plots (Matplotlib SVG images) were embedded here; they are not preserved in this text version.]

Correlations

Variable | Country | CustomerID | Day | Hour | InvoiceNo | Month | Quantity | TotalAmount | UnitPrice | Year
Country | 1.000 | 0.301 | 0.062 | 0.086 | 0.976 | 0.065 | 0.000 | 0.000 | 0.042 | 0.055
CustomerID | 0.301 | 1.000 | -0.003 | 0.057 | 0.001 | 0.033 | -0.154 | -0.174 | -0.012 | 0.054
Day | 0.062 | -0.003 | 1.000 | 0.013 | 0.086 | -0.148 | 0.004 | 0.001 | -0.004 | 0.188
Hour | 0.086 | 0.057 | 0.013 | 1.000 | 0.052 | 0.060 | -0.151 | -0.168 | -0.008 | 0.037
InvoiceNo | 0.976 | 0.001 | 0.086 | 0.052 | 1.000 | 0.641 | -0.034 | -0.067 | -0.047 | 0.976
Month | 0.065 | 0.033 | -0.148 | 0.060 | 0.641 | 1.000 | -0.057 | -0.073 | -0.021 | 0.435
Quantity | 0.000 | -0.154 | 0.004 | -0.151 | -0.034 | -0.057 | 1.000 | 0.657 | -0.408 | 0.000
TotalAmount | 0.000 | -0.174 | 0.001 | -0.168 | -0.067 | -0.073 | 0.657 | 1.000 | 0.349 | 0.000
UnitPrice | 0.042 | -0.012 | -0.004 | -0.008 | -0.047 | -0.021 | -0.408 | 0.349 | 1.000 | 0.000
Year | 0.055 | 0.054 | 0.188 | 0.037 | 0.976 | 0.435 | 0.000 | 0.000 | 0.000 | 1.000
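
The "High correlation" alerts in the overview can be traced back to this matrix. The sketch below scans the upper triangle for coefficients whose absolute value reaches a cut-off. The cut-off of 0.5 is an assumption (the report does not print the threshold it used), chosen because it reproduces exactly the pairs named in the alerts. `high_pairs` is a hypothetical helper.

```python
# Coefficients copied from the correlation matrix in the report.
names = ["Country", "CustomerID", "Day", "Hour", "InvoiceNo",
         "Month", "Quantity", "TotalAmount", "UnitPrice", "Year"]
matrix = [
    [1.000, 0.301, 0.062, 0.086, 0.976, 0.065, 0.000, 0.000, 0.042, 0.055],
    [0.301, 1.000, -0.003, 0.057, 0.001, 0.033, -0.154, -0.174, -0.012, 0.054],
    [0.062, -0.003, 1.000, 0.013, 0.086, -0.148, 0.004, 0.001, -0.004, 0.188],
    [0.086, 0.057, 0.013, 1.000, 0.052, 0.060, -0.151, -0.168, -0.008, 0.037],
    [0.976, 0.001, 0.086, 0.052, 1.000, 0.641, -0.034, -0.067, -0.047, 0.976],
    [0.065, 0.033, -0.148, 0.060, 0.641, 1.000, -0.057, -0.073, -0.021, 0.435],
    [0.000, -0.154, 0.004, -0.151, -0.034, -0.057, 1.000, 0.657, -0.408, 0.000],
    [0.000, -0.174, 0.001, -0.168, -0.067, -0.073, 0.657, 1.000, 0.349, 0.000],
    [0.042, -0.012, -0.004, -0.008, -0.047, -0.021, -0.408, 0.349, 1.000, 0.000],
    [0.055, 0.054, 0.188, 0.037, 0.976, 0.435, 0.000, 0.000, 0.000, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off; not stated in the report

def high_pairs(names, matrix, threshold):
    """List (a, b, r) for every pair above the diagonal with |r| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs

for a, b, r in high_pairs(names, matrix, THRESHOLD):
    print(f"{a} ~ {b}: {r:+.3f}")
```

The InvoiceNo pairings with Month and Year most likely reflect invoice numbers being issued in time order. The Quantity and TotalAmount pairing is expected if TotalAmount is the product of Quantity and UnitPrice.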

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

(index) | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | Hour | Day | Month | Year | TotalAmount
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom | 8 | 1 | 12 | 2010 | 15.30
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 8 | 1 | 12 | 2010 | 20.34
2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom | 8 | 1 | 12 | 2010 | 22.00
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 8 | 1 | 12 | 2010 | 20.34
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 8 | 1 | 12 | 2010 | 20.34
5 | 536365 | 22752 | SET 7 BABUSHKA NESTING BOXES | 2 | 2010-12-01 08:26:00 | 7.65 | 17850.0 | United Kingdom | 8 | 1 | 12 | 2010 | 15.30
6 | 536365 | 21730 | GLASS STAR FROSTED T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 4.25 | 17850.0 | United Kingdom | 8 | 1 | 12 | 2010 | 25.50
7 | 536366 | 22633 | HAND WARMER UNION JACK | 6 | 2010-12-01 08:28:00 | 1.85 | 17850.0 | United Kingdom | 8 | 1 | 12 | 2010 | 11.10
8 | 536366 | 22632 | HAND WARMER RED POLKA DOT | 6 | 2010-12-01 08:28:00 | 1.85 | 17850.0 | United Kingdom | 8 | 1 | 12 | 2010 | 11.10
9 | 536367 | 84879 | ASSORTED COLOUR BIRD ORNAMENT | 32 | 2010-12-01 08:34:00 | 1.69 | 13047.0 | United Kingdom | 8 | 1 | 12 | 2010 | 54.08

Last rows

(index) | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | Hour | Day | Month | Year | TotalAmount
541899 | 581587 | 22726 | ALARM CLOCK BAKELIKE GREEN | 4 | 2011-12-09 12:50:00 | 3.75 | 12680.0 | France | 12 | 9 | 12 | 2011 | 15.00
541900 | 581587 | 22730 | ALARM CLOCK BAKELIKE IVORY | 4 | 2011-12-09 12:50:00 | 3.75 | 12680.0 | France | 12 | 9 | 12 | 2011 | 15.00
541901 | 581587 | 22367 | CHILDRENS APRON SPACEBOY DESIGN | 8 | 2011-12-09 12:50:00 | 1.95 | 12680.0 | France | 12 | 9 | 12 | 2011 | 15.60
541902 | 581587 | 22629 | SPACEBOY LUNCH BOX | 12 | 2011-12-09 12:50:00 | 1.95 | 12680.0 | France | 12 | 9 | 12 | 2011 | 23.40
541903 | 581587 | 23256 | CHILDRENS CUTLERY SPACEBOY | 4 | 2011-12-09 12:50:00 | 4.15 | 12680.0 | France | 12 | 9 | 12 | 2011 | 16.60
541904 | 581587 | 22613 | PACK OF 20 SPACEBOY NAPKINS | 12 | 2011-12-09 12:50:00 | 0.85 | 12680.0 | France | 12 | 9 | 12 | 2011 | 10.20
541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 2011-12-09 12:50:00 | 2.10 | 12680.0 | France | 12 | 9 | 12 | 2011 | 12.60
541906 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 2011-12-09 12:50:00 | 4.15 | 12680.0 | France | 12 | 9 | 12 | 2011 | 16.60
541907 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 2011-12-09 12:50:00 | 4.15 | 12680.0 | France | 12 | 9 | 12 | 2011 | 16.60
541908 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 2011-12-09 12:50:00 | 4.95 | 12680.0 | France | 12 | 9 | 12 | 2011 | 14.85
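
The sample rows suggest that Hour, Day, Month and Year are the calendar parts of InvoiceDate and that TotalAmount equals Quantity × UnitPrice. For example, 32 × 1.69 = 54.08 in row 9. The report does not document how these columns were built, so the sketch below is only one derivation that is consistent with the rows shown. `derive_fields` is a hypothetical helper.

```python
from datetime import datetime

def derive_fields(invoice_date: str, quantity: int, unit_price: float) -> dict:
    """Derive the calendar parts and the line total for one invoice line."""
    ts = datetime.strptime(invoice_date, "%Y-%m-%d %H:%M:%S")
    return {
        "Hour": ts.hour,
        "Day": ts.day,
        "Month": ts.month,
        "Year": ts.year,
        # Rounded to 2 decimals to avoid floating-point noise in the product.
        "TotalAmount": round(quantity * unit_price, 2),
    }

# Row 9 of the first-rows sample: 32 units at 1.69 on 2010-12-01 08:34:00.
first = derive_fields("2010-12-01 08:34:00", 32, 1.69)
# Last row of the sample: 3 units at 4.95 on 2011-12-09 12:50:00.
last = derive_fields("2011-12-09 12:50:00", 3, 4.95)
print(first)
print(last)
```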

Duplicate rows

Most frequently occurring

(index) | InvoiceNo | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | Hour | Day | Month | Year | TotalAmount | # duplicates
1612 | 555524 | PINK REGENCY TEACUP AND SAUCER | 1 | 2011-06-05 11:37:00 | 2.95 | 16923.0 | United Kingdom | 11 | 5 | 6 | 2011 | 2.95 | 20
1611 | 555524 | GREEN REGENCY TEACUP AND SAUCER | 1 | 2011-06-05 11:37:00 | 2.95 | 16923.0 | United Kingdom | 11 | 5 | 6 | 2011 | 2.95 | 12
3194 | 572861 | PURPLE DRAWERKNOB ACRYLIC EDWARDIAN | 12 | 2011-10-26 12:46:00 | 1.25 | 14102.0 | United Kingdom | 12 | 26 | 10 | 2011 | 15.00 | 8
345 | 538514 | BATH BUILDING BLOCK WORD | 1 | 2010-12-12 14:27:00 | 5.95 | 15044.0 | United Kingdom | 14 | 12 | 12 | 2010 | 5.95 | 6
478 | 540524 | BATH BUILDING BLOCK WORD | 1 | 2011-01-09 12:53:00 | 5.95 | 16735.0 | United Kingdom | 12 | 9 | 1 | 2011 | 5.95 | 6
535 | 541266 | HOME BUILDING BLOCK WORD | 1 | 2011-01-16 16:25:00 | 5.95 | 15673.0 | United Kingdom | 16 | 16 | 1 | 2011 | 5.95 | 6
536 | 541266 | LOVE BUILDING BLOCK WORD | 1 | 2011-01-16 16:25:00 | 5.95 | 15673.0 | United Kingdom | 16 | 16 | 1 | 2011 | 5.95 | 6
1089 | 547651 | METAL SIGN,CUPCAKE SINGLE HOOK | 1 | 2011-03-24 12:11:00 | 1.25 | 16904.0 | United Kingdom | 12 | 24 | 3 | 2011 | 1.25 | 6
3140 | 572344 | Manual | 48 | 2011-10-24 10:43:00 | 1.50 | 14607.0 | United Kingdom | 10 | 24 | 10 | 2011 | 72.00 | 6
4233 | 578289 | BELLE JARDINIERE CUSHION COVER | 1 | 2011-11-23 14:07:00 | 3.75 | 17841.0 | United Kingdom | 14 | 23 | 11 | 2011 | 3.75 | 6
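
A table like this one can be produced by counting identical rows. The sketch below does so with the standard library on toy rows that are modelled on the table but use made-up repetition counts. It treats every copy after the first as a duplicate, which is an assumption, since the report does not say whether its totals count all copies or only the extra ones. `count_duplicate_rows` is a hypothetical helper.

```python
from collections import Counter

def count_duplicate_rows(rows):
    """Return (number of rows repeating an earlier row, Counter of row -> copies)."""
    counts = Counter(rows)
    n_duplicates = sum(c - 1 for c in counts.values())
    return n_duplicates, counts

# Toy rows (InvoiceNo, Description, Quantity, UnitPrice); repetition is made up.
rows = [
    (555524, "PINK REGENCY TEACUP AND SAUCER", 1, 2.95),
    (555524, "PINK REGENCY TEACUP AND SAUCER", 1, 2.95),
    (555524, "GREEN REGENCY TEACUP AND SAUCER", 1, 2.95),
    (555524, "PINK REGENCY TEACUP AND SAUCER", 1, 2.95),
]

n_dup, counts = count_duplicate_rows(rows)
most_repeated_row, copies = counts.most_common(1)[0]
print(f"{n_dup} duplicate rows; most repeated: {most_repeated_row[1]} ({copies} copies)")
```

Before dropping such rows, it is worth checking whether repeated lines on one invoice are genuine separate purchases rather than data-entry repeats.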